Association of CYP19A1 rs28757157 polymorphism with lung cancer risk in the Chinese Han population

Background Lung cancer is the leading cause of cancer death globally. Recent studies have revealed that the CYP19A1 gene plays a crucial role in cancer initiation and development. The aim of this study was to assess the association of CYP19A1 genetic polymorphisms with the risk of lung cancer in the Chinese Han population. Methods This study randomly recruited 489 lung cancer patients and 467 healthy controls. The genotypes of four single nucleotide polymorphisms (SNPs) of the CYP19A1 gene were identified by the Agena MassARRAY technique. Genetic model analysis was used to assess the association between genetic variations and lung cancer risk. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated to evaluate the effect of the four selected SNPs on lung cancer risk. Results CYP19A1 rs28757157 might contribute to an increased risk of lung cancer (p = 0.025, OR = 1.30, 95% CI 1.03–1.64). In stratified analysis, rs28757157 was associated with an increased cancer risk in the population aged under 60 years, females, smokers, and drinkers. In addition, rs3751592 and rs59429575 were identified as risk biomarkers in the population aged under 60 years and in drinkers. A relationship between rs28757157 and an enhanced risk of squamous cell carcinoma was also found, while the rs3751592 CC genotype was identified as a risk factor for lung adenocarcinoma development. Conclusions This study revealed that three SNPs (rs28757157, rs3751592, and rs59429575) of CYP19A1 are associated with lung cancer in the Chinese Han population. These findings will provide theoretical support for further functional studies of CYP19A1 in lung cancer. Supplementary Information The online version contains supplementary material available at 10.1186/s12957-022-02868-9.

Lung cancer is the leading cause of cancer mortality worldwide, although women are less than half as likely to die of lung cancer as men [1]. Among non-smokers, lung cancer tends to be more common in females [4]. These findings have drawn attention to the effects of estrogen on lung cancer risk. It has been reported that both estrogen receptor and aromatase are present in human lung tumors [7,8]. These results suggest that estrogen may play a role in the biological behavior of human lung cancer.
Cytochrome P450 (CYP450) enzymes are pivotal for biological homeostasis. CYP450 enzymes also play a key role in the metabolism of many endogenous substrates and exogenous carcinogens, including aromatic and heterocyclic amines. The resulting metabolites can then bind covalently to DNA to form DNA adducts, which in turn may cause cancer [9,10]. The cytochrome P450 family 19 subfamily A polypeptide 1 (CYP19A1) gene encodes aromatase, which is a member of the CYP450 superfamily and a key enzyme in oestradiol biosynthesis. Mutations in the CYP19A1 gene can result in either increased or decreased aromatase activity [11], and aromatase plays an important role in lung cancer [12]. This suggests that CYP19A1 genetic variations may indirectly affect the occurrence of lung cancer, but the exact mechanism is unclear. Several studies have also reported a close relationship between CYP19A1 genetic variants and lung-related diseases, including lung cancer [13]. CYP19A1 rs3764221 has been reported to be significantly associated with the multicentric development of lung adenocarcinomas [13]. Moreover, CYP19A1 rs727479 is significantly associated with the incidence of lung cancer [14]. However, there are still a large number of single-nucleotide polymorphisms (SNPs) in CYP19A1 whose association with lung cancer risk has not been reported.

Participants
To ensure the accuracy and credibility of the results, we used G*Power 3.1.9.7 software (https://stats.idre.ucla.edu/other/gpower/) to estimate the required sample size before conducting this study. The parameters were set as follows: effect size d = 0.2; α error probability = 0.05; and power (1-β error probability) = 80%. This calculation yielded a minimum sample of 412 cases and 412 controls. We recruited 489 cases and 467 controls, which exceeds the sample size recommended by G*Power. The 489 pathologically confirmed lung cancer patients were recruited from Xuanwei City, Yunnan. All cases were diagnosed as lung cancer by histological examination according to the World Health Organization tumor classification system and confirmed by two independent pathologists. The exclusion criteria for patients were as follows: (1) history of other tumors; (2) family history of lung cancer; (3) chemotherapy or radiotherapy treatment; (4) hypertension, diabetes mellitus, or any endocrine metabolic diseases; and (5) other lung diseases. The control group was composed of 467 healthy subjects who were volunteer blood donors from the same city as the cases. Controls with a history of any cancer, other endocrine metabolic diseases, or other lung diseases were excluded. Eligible study participants were screened using a specialized questionnaire covering demographic characteristics, disease history, lung status, and family history of other types of tumors. All participants were of Chinese Han ancestry from northwest China. The research protocol complied with the Declaration of Helsinki and was approved by the Ethics Committee of the First People's Hospital of Yunnan Province, and written informed consent was obtained from all subjects.
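The order of magnitude of this sample size estimate can be checked with a short calculation. The sketch below is not the G*Power procedure itself: the text does not state which test family was selected, so the code assumes a two-sided, two-sample comparison of means and uses the standard normal approximation. The function name `n_per_group` is illustrative. Under these assumptions the result is of the same order as, but not identical to, the 412 per group reported above.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation; an assumption, since the
    test family used in G*Power is not stated in the text)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Parameters taken from the text: d = 0.2, alpha = 0.05, power = 80%
print(n_per_group(0.2))
```

A rank-based test or an exact noncentral-t calculation would require somewhat more subjects than this approximation, which may explain part of the difference from the reported figure.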

SNP selection
Four SNPs (rs28757157 (NG_007982.1:g.90395G > C), rs3751592 (NG_007982.1:g.29218A > G), rs3751591 (NG_007982.1:g.29086T > C), and rs59429575 (NG_007982.1:g.28719G > A)) in CYP19A1 were randomly selected based on the following criteria: (1) variations of CYP19A1 in the CHB and CHS populations from the Ensembl GRCh37 database (http://asia.ensembl.org/Homo_sapiens/Info/Index); (2) Hardy-Weinberg equilibrium (HWE) p > 0.01, minor allele frequency (MAF) > 0.05, and min genotype > 75% using Haploview software; (3) combined with the MassARRAY primer design software, HWE p > 0.05, MAF > 0.05, and call rate > 95% in our study population; and (4) MAF > 0.05 in the 1000 Genomes (http://www.internationalgenome.org/) and dbSNP (http://www.bioinfo.org.cn) databases.
First, a locus-specific PCR reaction was performed, followed by a locus-specific primer extension reaction (iPLEX assay), in which oligonucleotide primers were annealed immediately upstream of the polymorphic site to be genotyped. In the iPLEX assay, primers and amplified target DNA were incubated with mass-modified dideoxynucleotide terminators. The primer is extended by a single complementary mass-modified base, determined by the sequence at the variant site. The mass of the extended primers was determined by MALDI-TOF mass spectrometry. The mass of the extended primer indicates the sequence and, therefore, the allele present at the polymorphic locus of interest, so SNP alleles could be distinguished by the different masses of the extended primers [17,18]. Data processing was carried out with Agena Bioscience TYPER software, version 4.0 (Agena Bioscience, San Diego, CA, USA) [19]. For quality control, a randomly selected 10% of samples were re-analyzed, with 100% concordance.
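The study-population filters in criterion (3) can be illustrated with a minimal sketch. It computes call rate, MAF, and a χ2-based HWE p value from genotype counts and applies the thresholds quoted above. The function name `snp_qc` and the genotype counts in the usage example are hypothetical and are not taken from the study data. Haploview and the MassARRAY software may use different HWE tests (for example, an exact test).

```python
from math import erfc, sqrt


def snp_qc(n_aa: int, n_ab: int, n_bb: int, n_total: int) -> dict:
    """QC metrics for one biallelic SNP from genotype counts (AA, AB, BB);
    n_total is the number of samples submitted for genotyping."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / n_total
    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    q = 1 - p
    maf = min(p, q)
    if maf == 0:
        hwe_p = 1.0  # monomorphic site: HWE test is not informative
    else:
        expected = (p * p * n_called, 2 * p * q * n_called, q * q * n_called)
        observed = (n_aa, n_ab, n_bb)
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        hwe_p = erfc(sqrt(chi2 / 2))  # chi-square upper tail, 1 degree of freedom
    passes = call_rate > 0.95 and maf > 0.05 and hwe_p > 0.05
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p, "passes": passes}


# Hypothetical counts for illustration only
print(snp_qc(n_aa=300, n_ab=145, n_bb=20, n_total=467))
```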

Statistical analysis and bioinformatics analysis
SPSS software (SPSS 22.0, USA) and Microsoft Excel were used for statistical analysis. Continuous variables were evaluated for normality using the Kolmogorov-Smirnov test. Continuous variables with a non-normal distribution (age and body mass index (BMI)) were presented as median with interquartile range (IQR) and compared using the Mann-Whitney U test. The differences in gender, smoking, and drinking distribution between the case and control groups were determined by the χ2 test. The χ2 test was also used to determine whether individual polymorphisms were in HWE and to detect differences in allele and genotype frequencies between cases and controls. The SNPStats software (https://www.snpstats.net/start.htm?q=snpstats/start.htm) was used to evaluate the relationship between polymorphisms and the risk of lung cancer in the Chinese Han population under different genetic models (genotype, dominant, recessive, and additive models).
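The genetic models differ only in how the three genotype groups are collapsed before cases and controls are compared. The sketch below illustrates this with crude (unadjusted) odds ratios for the dominant, recessive, and genotype comparisons. It is not the SNPStats procedure, which fits logistic regression and can adjust for covariates; the additive (per-allele) model requires such a regression and is omitted here. The function names and the genotype counts are hypothetical.

```python
def collapse(counts, model):
    """Split genotype counts (ref_hom, het, alt_hom) into (exposed, unexposed)
    according to the genetic model."""
    ref_hom, het, alt_hom = counts
    if model == "dominant":      # het + alt_hom vs ref_hom
        return het + alt_hom, ref_hom
    if model == "recessive":     # alt_hom vs ref_hom + het
        return alt_hom, ref_hom + het
    if model == "heterozygote":  # genotype model: het vs ref_hom
        return het, ref_hom
    if model == "homozygote":    # genotype model: alt_hom vs ref_hom
        return alt_hom, ref_hom
    raise ValueError(f"unknown model: {model}")


def crude_or(cases, controls, model):
    """Unadjusted odds ratio for the given model from genotype counts."""
    case_exp, case_unexp = collapse(cases, model)
    ctrl_exp, ctrl_unexp = collapse(controls, model)
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical genotype counts (ref_hom, het, alt_hom), for illustration only
cases = (300, 160, 29)
controls = (320, 130, 17)
for model in ("dominant", "recessive", "heterozygote", "homozygote"):
    print(model, round(crude_or(cases, controls, model), 2))
```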
In multifactor dimensionality reduction (MDR) analysis, multilocus genotypes were classified into high- and low-risk groups, so that multidimensional genotype variables were reduced to a single dimension [23]. To explore the association of high-order SNP-SNP interactions with susceptibility to lung cancer, we used the MDR method with cross-validation and permutation-test procedures. Cross-validation reduces the possibility of false-positive results by dividing the data into a training set and a testing set and repeating the analysis so that each part of the data is used for testing. Balanced accuracy was used to assess model quality, and the overall best model was the one with the greatest accuracy in the testing data. The cross-validation consistency (CVC) records the number of cross-validation intervals in which a particular model was identified as the best; higher numbers indicate more robust results. A permutation test was used to assess the significance of the best model [24]. Permutation testing indicated that the cross-validation consistency and the prediction error were statistically significant at the 0.001 level; that is, among 1000 permuted datasets, no best model had a cross-validation consistency or a prediction error of the same magnitude as that observed for the original dataset. The optimal CYP19A1 SNP-SNP interaction model for lung cancer susceptibility was identified using MDR 3.0.2 software.
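The core reduction step of MDR can be sketched as follows. Each multilocus genotype combination (cell) is labeled high risk when its case:control ratio reaches the overall case:control ratio, and the resulting one-dimensional classifier is scored by balanced accuracy. This is a simplified illustration, not the MDR 3.0.2 implementation: cross-validation and permutation testing are omitted, the rule for cells without controls is an assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter


def high_risk_cells(genotypes, labels):
    """Return the set of genotype combinations labeled high risk.
    genotypes: list of tuples (one genotype per SNP); labels: 1 = case, 0 = control."""
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    threshold = sum(cases.values()) / sum(controls.values())
    high = set()
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        # Assumption: a cell containing cases but no controls is high risk
        if n_ctrl == 0 or n_case / n_ctrl >= threshold:
            high.add(cell)
    return high


def balanced_accuracy(high, genotypes, labels):
    """Mean of sensitivity and specificity when high-risk cells predict 'case'."""
    tp = sum(1 for g, y in zip(genotypes, labels) if y == 1 and g in high)
    tn = sum(1 for g, y in zip(genotypes, labels) if y == 0 and g not in high)
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    return (tp / n_case + tn / n_ctrl) / 2


# Toy two-SNP data, for illustration only
cell_a, cell_b = ("GC", "AG"), ("GG", "AA")
genotypes = [cell_a] * 10 + [cell_b] * 10
labels = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
high = high_risk_cells(genotypes, labels)
print(high, balanced_accuracy(high, genotypes, labels))
```

In a full analysis, the high-risk labels would be learned on the training folds only and the balanced accuracy reported on the held-out fold.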

Study population
In this study, 489 lung cancer patients (337 males and 152 females) and 467 healthy controls (326 males and 141 females) were enrolled. The median (IQR) ages of cases and controls were 61.00 (56.00-65.00) years and 61.00 (55.00-65.00) years, respectively (Table 1). In addition, the characteristics of the study population were collected for subsequent analyses, including BMI, smoking and drinking history, pathological type, pathological stage, and lymph node metastasis (LNM). There was no significant difference in age, gender, BMI, smoking, or drinking between the case group and the control group (p > 0.05).

Genetic analyses of the selected SNPs with the risk of lung cancer
Four SNPs in CYP19A1 were genotyped among subjects. The representative spectrum of each SNP is displayed in Supplemental Fig. 1. The basic information about all candidate SNPs is listed in Table 2. All SNPs are located on chromosome 15, at different positions within the CYP19A1 gene. Deviation from Hardy-Weinberg equilibrium was evaluated in the control group, and all candidate SNPs met the expected threshold (p > 0.05) and were suitable for further analysis. Under the allele model, there was a significant difference in the allele distribution of rs28757157 between the lung cancer cases (0.215) and healthy controls (0.174), and the rs28757157 T allele might contribute to an increased risk of lung cancer (p = 0.025, OR = 1.30, 95% CI 1.03-1.64). Functional prediction of SNPs was conducted in the HaploReg v4.1 and RegulomeDB databases to explore their regulatory effects. The results showed that the four SNPs exhibited potential biological functions in gene regulation. Based on the QTLbase database, the genotypes of CYP19A1 rs28757157 (p = 6.610 × 10^-5) were related to the mRNA expression of CYP19A1 in the lungs (Fig. 1). The relationship between CYP19A1 polymorphisms and the risk of lung cancer under four genetic models is listed in Table 3. Our results revealed an association between rs28757157 and increased risk of lung cancer in the genotype (p = 0.034, OR = 1.43, 95% CI 1.09-1.88), dominant (p = 0.011, OR = 1.41, 95% CI 1.08-1.85), and additive (p = 0.021, OR = 1.34, 95% CI 1.04-1.71) models.
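The allele-model estimate for rs28757157 can be approximately reproduced from the reported allele frequencies. The sketch below computes an odds ratio with a Woolf (log-based) 95% confidence interval. The allele counts are reconstructed from the frequencies under the assumption that all 489 cases and 467 controls were successfully genotyped, so they are approximations rather than the study's actual counts, and the interval may differ slightly from the published one. The function name `allelic_or_ci` is illustrative.

```python
from math import exp, log, sqrt


def allelic_or_ci(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Odds ratio for the minor allele with a Woolf (log) confidence interval."""
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se_log_or = sqrt(1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Approximate allele counts rebuilt from the reported frequencies
# (assumes every participant was genotyped; two alleles per person)
case_alleles, ctrl_alleles = 2 * 489, 2 * 467
case_minor = 0.215 * case_alleles
ctrl_minor = 0.174 * ctrl_alleles
odds_ratio, lower, upper = allelic_or_ci(
    case_minor, case_alleles - case_minor, ctrl_minor, ctrl_alleles - ctrl_minor
)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Under these assumptions the point estimate is approximately 1.30, in line with the value reported for the allele model.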

Stratification analyses by demographic characteristics
In addition, we conducted a stratified analysis by demographic characteristics (age, gender, BMI, smoking, and drinking) to explore the risk effects of these SNPs in specific groups, as shown in Table 4. [...] were related to an increased risk of lung cancer in drinkers, whereas rs3751592 (p = 0.023, OR = 3.31) was identified as a genetic risk factor for lung cancer susceptibility in non-drinkers. However, no significant correlation between CYP19A1 polymorphisms and lung cancer risk was found after stratification by BMI.

Stratification analyses by clinical characteristics
As listed in Table 5, the correlation between CYP19A1 polymorphisms and lung cancer risk in the different groups (tumor type, LNM, and stage) was assessed. The stratified analysis by tumor type demonstrated a relationship between rs28757157 and an enhanced risk of squamous cell carcinoma, while the rs3751592 CC genotype was identified as a risk factor for lung adenocarcinoma development.

MDR analysis
The association between higher-order SNP-SNP interactions and the predisposition to lung cancer was examined by MDR, as summarized in Fig. 2 and Table 7. Figure 2 shows that these four polymorphisms exhibited strong redundancy effects on the risk of lung cancer, and rs28757157 had an information gain of 2.22% as an individual attribute regarding the occurrence of lung cancer. Table 6 shows that the most influential single-locus attributor for lung cancer risk was rs28757157 (testing balanced accuracy of 0.5503 and cross-validation consistency of 10/10). MDR analysis of gene-environment interaction also suggested that rs28757157 was the most influential single-factor attributor for lung cancer risk. Gender and smoking were found to be the most important environmental factors affecting lung cancer susceptibility. In addition, the gene-environment interaction model composed of rs28757157, rs3751591, gender, BMI, and smoking showed higher testing balanced accuracy (0.601) and cross-validation consistency (9/10), indicating that this interaction model was a candidate gene-environment model in our population. Figure 3 shows a strong synergistic effect of gene-environment interaction on lung cancer risk.

Discussion
In this study, the association of four SNPs in the CYP19A1 gene with susceptibility to lung cancer in a Chinese Han cohort was assessed. Statistical and bioinformatics results highlighted the important roles of rs28757157, rs3751592, and rs59429575 in the onset of lung cancer in the total or stratified population, which helps improve our understanding of CYP19A1 in this disease. The CYP19A1 gene, which encodes aromatase and is responsible for the final step in the biosynthesis of the estrogens estradiol (E2) and estrone (E1), has been intensively investigated [25,26]. SNPs in the intron region of CYP19A1 have been found to play an important role in the transcriptional regulation and splicing of CYP19A1 and could produce enzymes whose activity differs from that of the normal gene product [27]. The allele frequencies of several CYP19A1 SNPs have been documented in different populations and ethnic groups around the world. SNPs in CYP19A1 were found to be associated with cancer risk [28]. In particular, CYP19A1 SNPs have been shown to be significantly associated with lung-related diseases.
A previous study has shown that SNP rs3764221 is significantly correlated with CYP19A1 expression in non-cancerous lung tissues and affects susceptibility to lung adenocarcinoma. The authors suggested that CYP19A1 polymorphisms may lead to elevated levels of local estrogen surrounding the lungs, and this excess local estrogen production may be one of the factors associated with the polycentric development of adenocarcinoma [13]. A recent study has suggested that CYP19A1 polymorphism is involved in lung bronchioloalveolar carcinoma and atypical adenomatous hyperplasia by causing differences in estrogen levels [29]. Taken together, these findings suggest that CYP19A1 polymorphisms may cause changes in estrogen levels around the lungs, which in turn can affect susceptibility to lung cancer. Our results revealed, for the first time, an association between rs28757157 and increased risk of lung cancer in the genotype, dominant, and additive models. In bioinformatic analysis, results from the HaploReg v4.1 database showed that rs28757157 may be associated with enhancer histone marks, changed motifs, and selected eQTL hits [30]. Based on the QTLbase database, the genotypes of CYP19A1 rs28757157 (p = 6.610 × 10^-5) were related to the mRNA expression of CYP19A1 in the lungs [31]. These results suggested that CYP19A1 rs28757157 may be involved in lung carcinogenesis by affecting the expression or function of CYP19A1, which requires further experimental confirmation. Notably, demographic characteristics (age, gender, BMI, smoking, and drinking) might influence the genetic association with the occurrence of lung cancer [32]. Our research showed that CYP19A1 rs28757157 was associated with increased cancer risk in the population aged under 60 years, females, smokers, and drinkers. In addition, rs3751592 and rs59429575 were identified as risk biomarkers in the population aged under 60 years and in drinkers.
These results indicated that the risk association of these polymorphisms might be age-, sex-, smoking-, and drinking-dependent, and that interactions between genes and behavioral habits might operate in the pathogenesis of lung cancer.
These SNPs are located in the intron region of the CYP19A1 gene. Combined with previous studies and database predictions, we speculated that CYP19A1 intron SNPs may alter mRNA splicing, thereby leading to changes in the activity of CYP19A1 and related estrogens, and may affect disease susceptibility. Since the statistical significance of the correlation between CYP19A1 gene polymorphisms and the risk of lung cancer is relatively weak, further experimental studies are needed to verify the results of this study.
Furthermore, the correlation between CYP19A1 polymorphisms and lung cancer risk in different groups (tumor type, LNM, and stage) was assessed. Stratified analysis by tumor type demonstrated a relationship between rs28757157 and an enhanced risk of squamous cell carcinoma, while the rs3751592 CC genotype was identified as a risk factor for lung adenocarcinoma development. These findings suggested that lung adenocarcinoma and squamous cell carcinoma may have different genetic pathological mechanisms, which needs to be further confirmed.
Our study has several limitations. All subjects were enrolled from the same hospital and the limitations of sample selection may affect the accuracy of this